Acceleration Operators in the Value Iteration Algorithms for Markov Decision Processes
Authors
Abstract
We propose a general approach to accelerate the convergence of the widely used solution methods for Markov decision processes (MDPs). The approach is inspired by the monotone behavior of the contraction mappings in the feasible set of the linear programming problem equivalent to the MDP. Numerical studies show that the computational savings can be significant, especially in cases where standard value iteration suffers from slow convergence. The same acceleration approach can be used in other types of MDPs. This paper sheds light on a new research avenue: combining the two well-studied theories of linear programming and Markov decision processes in the hope of surmounting the curse of dimensionality.
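The baseline that the proposed operators accelerate is standard value iteration, i.e. repeated application of the Bellman optimality operator until successive iterates are close. The sketch below is a generic textbook implementation on a toy two-state, two-action MDP whose transition and reward numbers are made up for illustration; it does not reproduce the paper's acceleration operators.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Standard value iteration for a discounted MDP.

    P: transition probabilities, shape (A, S, S), rows sum to 1
    R: expected one-step rewards, shape (A, S)
    Returns an approximate optimal value function and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality operator:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:  # sup-norm stopping rule
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=0)

# Toy MDP (illustrative numbers only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, pi = value_iteration(P, R)
```

Because the Bellman operator is a gamma-contraction in the sup norm, the loop converges geometrically at rate gamma; the slow-convergence regimes mentioned in the abstract arise when gamma is close to 1.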
Related works
Acceleration Operators in the Value Iteration Algorithms for Average Reward Markov Decision Processes
One of the most widely used methods for solving average cost MDP problems is the value iteration method. This method, however, is often computationally impractical, which restricts the size of solvable MDP problems. We propose acceleration operators that improve the performance of value iteration for average reward MDP models. These operators are based on two important properties of Markovian ...
Full text
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Full text
First Order Markov Decision Processes A Dissertation submitted
Relational Markov Decision Processes (RMDP) are a useful abstraction for complex reinforcement learning problems and stochastic planning problems since one can develop abstract solutions for them that are independent of domain size or instantiation. This thesis develops compact representations for RMDPs and exact solution methods for RMDPs using such representations. One of the core contributio...
Full text
Empirical Dynamic Programming
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get ‘empirical value iteration’ (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get ‘empirical policy iter...
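The "empirical value iteration" idea described in this snippet, replacing the exact expectation in the Bellman operator with a sample average over simulated next states, can be sketched as follows. The toy MDP numbers are made up, and the plain Monte Carlo resampling here is an illustrative stand-in rather than the authors' exact scheme.

```python
import numpy as np

def empirical_bellman(V, P, R, gamma, n_samples, rng):
    """One step of empirical value iteration: the expectation over next
    states in the Bellman operator is replaced by a sample average of
    V over next states drawn from the transition kernel."""
    A, S, _ = P.shape
    Q = np.empty((A, S))
    for a in range(A):
        for s in range(S):
            # Simulate next states instead of computing the exact sum.
            nxt = rng.choice(S, size=n_samples, p=P[a, s])
            Q[a, s] = R[a, s] + gamma * V[nxt].mean()
    return Q.max(axis=0)

rng = np.random.default_rng(0)

# Toy MDP (illustrative numbers only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V = np.zeros(2)
for _ in range(200):
    V = empirical_bellman(V, P, R, gamma=0.9, n_samples=100, rng=rng)
```

Unlike exact value iteration, the iterates here are random, so convergence holds only in a probabilistic sense and the final V is a noisy estimate of the optimal value function whose error shrinks as n_samples grows.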
Full text
A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes
We present a technique for speeding up the convergence of value iteration for partially observable Markov decision processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov decision processes (MDPs). The technique can be easily incorporated into any existing POMDP value iteration algorithms. Experiments have been conducted on ...
Full text
Journal:
Operations Research
Volume 58, Issue
Pages -
Publication date: 2010